Coarse annotation of cell-types

In this notebook, we will

  • Perform unsupervised Leiden clustering
  • Annotate major cell-types
  • Compare the cell-type annotations with FACS measurements

Input data

import os

import matplotlib.pyplot as plt
import pandas as pd
import scanpy as sc

# Parameters
input_file = "input_adata.h5ad"
output_file = "adata.h5ad"
table_dir = "tables"
markers = pd.read_csv(os.path.join(table_dir, "cell_type_markers.csv"))
adata = sc.read_h5ad(input_file)

Leiden clustering

random_state = 42
sc.pp.neighbors(adata, n_pcs=20, random_state=random_state)
sc.tl.umap(adata, random_state=random_state)
sc.tl.leiden(adata, resolution=2, random_state=random_state)
computing neighbors
    using 'X_pca' with n_pcs = 20
    finished
computing UMAP
    finished
running Leiden clustering
    finished
            
fig, ax = plt.subplots(figsize=(14, 10))
sc.pl.umap(
    adata, color="leiden", ax=ax, legend_loc="on data", size=20, legend_fontoutline=3
)

Visualize cell-type markers

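# `cell_types` is not defined in the cells shown here; it is assumed to be
# the set of cell types listed in the marker table.
cell_types = markers["cell_type"].unique()

# Plot one UMAP panel per marker gene, grouped by cell type.
for ct in cell_types:
    marker_genes = markers.loc[markers["cell_type"] == ct, "gene_identifier"]
    sc.pl.umap(
        adata, color=marker_genes, title=["{}: {}".format(ct, g) for g in marker_genes]
    )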

Assign cell types

fig, ax = plt.subplots(figsize=(14, 10))
sc.pl.umap(
    adata, legend_loc="on data", color="leiden", ax=ax, size=20, legend_fontoutline=3
)

Assign clusters to cell types using the following mapping:

annotation = {
    "B cell": [17, 4, 1, 28, 6, 7, 19, 8],
    "CAF": [27],
    "Endothelial cell": [21],
    "Mast cell": [32],
    "NK cell": [0, 18, 31, 26],
    "T cell": [2, 9, 20, 14, 24, 3, 10, 16, 12, 11, 15, 30, 5, 13, 25],
    "myeloid": [22],
    "pDC": [33],
}
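The cell that applies this mapping is not shown in the notebook. The following is a minimal sketch of one way it could be done, assuming the Leiden labels are stored as strings (as scanpy does) and that clusters absent from the mapping (for example 23 and 29, which are not listed above) are labelled "unknown". The helper names `invert_annotation` and `assign_cell_types` are hypothetical, and the toy `leiden` series stands in for `adata.obs["leiden"]`.

```python
import pandas as pd

# Mapping from the notebook: cell type -> list of Leiden cluster ids.
annotation = {
    "B cell": [17, 4, 1, 28, 6, 7, 19, 8],
    "CAF": [27],
    "Endothelial cell": [21],
    "Mast cell": [32],
    "NK cell": [0, 18, 31, 26],
    "T cell": [2, 9, 20, 14, 24, 3, 10, 16, 12, 11, 15, 30, 5, 13, 25],
    "myeloid": [22],
    "pDC": [33],
}


def invert_annotation(annotation):
    """Turn {cell_type: [clusters]} into {cluster_label: cell_type}."""
    cluster_to_type = {}
    for cell_type, clusters in annotation.items():
        for cluster in clusters:
            label = str(cluster)  # Leiden labels are stored as strings
            if label in cluster_to_type:
                raise ValueError(f"cluster {label} is assigned twice")
            cluster_to_type[label] = cell_type
    return cluster_to_type


def assign_cell_types(leiden, annotation, fallback="unknown"):
    """Map cluster labels to cell types; unmapped clusters get `fallback`."""
    cluster_to_type = invert_annotation(annotation)
    return leiden.astype(str).map(cluster_to_type).fillna(fallback)


# Toy stand-in for adata.obs["leiden"] (hypothetical values).
leiden = pd.Series(["0", "2", "23", "33"], name="leiden")
cell_type = assign_cell_types(leiden, annotation)
print(cell_type.tolist())
```

On the real object this would presumably be used as `adata.obs["cell_type"] = assign_cell_types(adata.obs["leiden"], annotation)`. How the `cell_type_unknown` column plotted in the Results section is derived is not shown in the notebook, so it is not reconstructed here.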

Results

sc.pl.umap(adata, color=["cell_type_unknown", "cell_type"])
... storing 'cell_type' as categorical
... storing 'cell_type_unknown' as categorical
            

display(
    adata.obs.groupby("cell_type")[["samples"]].count().sort_values("samples"), n=50
)
                  samples
cell_type
pDC                    71
Mast cell              86
CAF                   226
myeloid               523
Endothelial cell      534
unknown               662
NK cell              2820
B cell               9132
T cell              14242

Cell-type distribution per sample

display(cell_type_fractions, n=50)
   samples  facs_purity_cd3  facs_purity_cd56  frac_t_cell  frac_nk_cell
0      H68            0.797             0.138     0.843984      0.129240
1     H141            0.288             0.025     0.358058      0.024275
2     H143            0.653             0.008     0.695455      0.002841
3     H149            0.644             0.033     0.750663      0.053935
4     H160            0.342             0.067     0.282983      0.035725
5     H176            0.558             0.108     0.621974      0.125388
6     H182            0.303             0.109     0.391644      0.105428
7     H185            0.493             0.163     0.655858      0.195607
8     H188            0.657             0.087     0.797382      0.073679
9     H197            0.271             0.171     0.416143      0.212788
10    H205            0.485             0.028     0.253820      0.040747
11    H208            0.336             0.323     0.487741      0.295972
12    H211            0.382             0.029     0.248805      0.026621
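The code that builds `cell_type_fractions` is not shown. As a rough sketch, per-sample fractions like `frac_t_cell` and `frac_nk_cell` could be obtained by cross-tabulating samples against cell types and normalizing each row. This assumes the fractions are taken relative to all cells of a sample, which the notebook does not state. The function name `cell_type_fractions_per_sample` and the toy `obs` table are hypothetical.

```python
import pandas as pd


def cell_type_fractions_per_sample(obs, sample_col="samples", type_col="cell_type"):
    """Rows: samples, columns: cell types, values: fraction of the sample's cells."""
    return pd.crosstab(obs[sample_col], obs[type_col], normalize="index")


# Toy stand-in for adata.obs (hypothetical values).
obs = pd.DataFrame(
    {
        "samples": ["S1", "S1", "S1", "S1", "S2", "S2"],
        "cell_type": ["T cell", "T cell", "T cell", "NK cell", "T cell", "B cell"],
    }
)
fractions = cell_type_fractions_per_sample(obs)
print(fractions)
```

Each row of the result sums to 1, so the "T cell" and "NK cell" columns can be compared directly with FACS purities expressed as fractions.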

Compare annotations with FACS markers

[Figure; y-axis label: "%T cells"]

[Figure; y-axis label: "%NK cells"]
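Beyond visual inspection, the agreement between the FACS measurements and the annotation-derived fractions could be summarized with a correlation coefficient. The sketch below is not part of the original analysis: it copies the values from the per-sample table in the previous section and computes Pearson's r with NumPy. The helper name `pearson_r` is hypothetical.

```python
import numpy as np

# Values copied from the per-sample table (samples H68 ... H211, in order).
facs_cd3 = np.array([0.797, 0.288, 0.653, 0.644, 0.342, 0.558, 0.303,
                     0.493, 0.657, 0.271, 0.485, 0.336, 0.382])
frac_t = np.array([0.843984, 0.358058, 0.695455, 0.750663, 0.282983,
                   0.621974, 0.391644, 0.655858, 0.797382, 0.416143,
                   0.253820, 0.487741, 0.248805])
facs_cd56 = np.array([0.138, 0.025, 0.008, 0.033, 0.067, 0.108, 0.109,
                      0.163, 0.087, 0.171, 0.028, 0.323, 0.029])
frac_nk = np.array([0.129240, 0.024275, 0.002841, 0.053935, 0.035725,
                    0.125388, 0.105428, 0.195607, 0.073679, 0.212788,
                    0.040747, 0.295972, 0.026621])


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


r_t = pearson_r(facs_cd3, frac_t)
r_nk = pearson_r(facs_cd56, frac_nk)
print(f"T cells:  r = {r_t:.2f}")
print(f"NK cells: r = {r_nk:.2f}")
```

With only 13 samples, such a coefficient is a coarse summary and is best read alongside the plots.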

Save output

adata.write(output_file, compression="lzf")